翻訳と辞書

Spider trap: English Wikipedia
Spider trap

A spider trap (or crawler trap) is a set of web pages that, intentionally or unintentionally, causes a web crawler or search bot to make an effectively unbounded number of requests, or causes a poorly constructed crawler to crash. Web crawlers are also called web spiders, from which the name is derived. Spider traps may be created deliberately to "catch" spambots or other crawlers that waste a website's bandwidth. They may also arise unintentionally, for example from calendars built on dynamic pages whose links continually point to the next day or year.
Common techniques include:
* Indefinitely deep directory structures, such as http://foo.com/bar/foo/bar/foo/bar/foo/bar/.....
* Dynamic pages that produce an unbounded number of documents for a web crawler to follow. Examples include calendars and algorithmically generated language poetry. (Neil M. Hennessy, "The Sweetest Poison, or The Discovery of L=A=N=G=U=A=G=E Poetry on the Web". Accessed 2013-09-26.)
* Documents filled with a very large number of characters, which can crash the lexical analyzer parsing the document.
* Documents with session IDs based on required cookies.
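To make the calendar case concrete, the following is a minimal sketch of how a dynamic page yields an unbounded set of documents. The `/calendar/<date>` URL scheme and the names `calendar_links` and `naive_crawl` are hypothetical and chosen only for illustration; no real site or crawler is being described.

```python
from datetime import date, timedelta


def calendar_links(url: str) -> list[str]:
    """Links found on a hypothetical calendar page such as /calendar/2013-09-26.

    Every page links to the following day, so the chain of pages never ends.
    """
    day = date.fromisoformat(url.rsplit("/", 1)[1])
    next_day = day + timedelta(days=1)
    return [f"/calendar/{next_day.isoformat()}"]


def naive_crawl(start: str, budget: int) -> list[str]:
    """Follow every link in discovery order until `budget` pages were fetched."""
    visited: list[str] = []
    frontier = [start]
    while frontier and len(visited) < budget:
        url = frontier.pop(0)
        if url in visited:
            continue  # duplicate detection does not help: every URL is new
        visited.append(url)
        frontier.extend(calendar_links(url))
    return visited


pages = naive_crawl("/calendar/2013-09-26", budget=500)
print(len(pages), pages[:2])
```

In this sketch the crawl only stops because of the explicit page budget; the frontier itself is never exhausted, since each page contributes one URL the crawler has not seen before.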
There is no algorithm to detect all spider traps. Some classes of traps can be detected automatically, but new, unrecognized traps arise quickly.
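As an illustration of a class of trap that can be detected automatically, the sketch below flags URLs whose paths are unusually deep or repeat the same segment many times, as in the directory example above. The function name `looks_like_trap` and all three thresholds are assumptions made for this example, not established values; a heuristic of this kind will miss other kinds of traps and may reject some legitimate URLs.

```python
from collections import Counter
from urllib.parse import urlsplit

# Illustrative thresholds (assumptions, not standard values).
MAX_DEPTH = 10            # maximum number of path segments
MAX_SEGMENT_REPEATS = 2   # maximum times one segment may occur in a path
MAX_URL_LENGTH = 2000     # maximum total URL length in characters


def looks_like_trap(url: str) -> bool:
    """Heuristically flag URLs that resemble an endlessly nested directory."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(url) > MAX_URL_LENGTH or len(segments) > MAX_DEPTH:
        return True
    # A segment recurring many times suggests a loop such as /bar/foo/bar/foo/...
    counts = Counter(segments)
    return any(n > MAX_SEGMENT_REPEATS for n in counts.values())


print(looks_like_trap("http://foo.com/bar/foo/bar/foo/bar/foo/bar/"))
print(looks_like_trap("http://foo.com/docs/guide/intro.html"))
```

A crawler could apply such a check before adding a URL to its frontier, skipping or deprioritizing the ones it flags.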
==Politeness==

A spider trap causes a web crawler to enter something like an infinite loop, which wastes the spider's resources, lowers its productivity, and, in the case of a poorly written crawler, can crash the program. Polite spiders alternate requests between different hosts and do not request documents from the same server more than once every several seconds, so a "polite" web crawler is affected to a much lesser degree than an "impolite" one.
In addition, sites with spider traps usually have a robots.txt file telling bots not to enter the trap. A legitimate "polite" bot therefore does not fall into the trap, whereas an "impolite" bot that disregards the robots.txt settings is affected by it.
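The politeness rule described above can be sketched as a URL frontier that keeps one queue per host and enforces a minimum delay between requests to the same host. The class name `PoliteFrontier` and the 5-second default are assumptions for illustration (the text says only "several seconds"), and the current time is passed in explicitly so the behaviour is easy to follow.

```python
from collections import defaultdict, deque
from urllib.parse import urlsplit


class PoliteFrontier:
    """Per-host queues with a minimum delay between requests to one host."""

    def __init__(self, min_delay: float = 5.0):
        self.min_delay = min_delay
        self.queues = defaultdict(deque)  # host -> pending URLs
        self.ready_at = {}                # host -> earliest time of next request

    def add(self, url: str) -> None:
        host = urlsplit(url).netloc
        self.queues[host].append(url)
        self.ready_at.setdefault(host, 0.0)

    def next_url(self, now: float):
        """Return a URL whose host may be contacted at time `now`, else None."""
        ready = [h for h, q in self.queues.items() if q and self.ready_at[h] <= now]
        if not ready:
            return None
        # Prefer the host that has been waiting longest.
        host = min(ready, key=lambda h: self.ready_at[h])
        self.ready_at[host] = now + self.min_delay
        return self.queues[host].popleft()


frontier = PoliteFrontier(min_delay=5.0)
frontier.add("http://a.example/trap/1")
frontier.add("http://a.example/trap/2")
frontier.add("http://b.example/page")
print(frontier.next_url(0.0))  # a URL from a.example
print(frontier.next_url(1.0))  # a.example is still cooling down, so b.example
```

With this scheduling, a trap on one host can consume at most one request per `min_delay` seconds, while the rest of the crawl continues on other hosts.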
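A minimal sketch of the robots.txt check, using Python's standard `urllib.robotparser` module. The robots.txt content, the `/trap/` path, the `ExampleBot` user agent, and the helper name `allowed` are all hypothetical; a real crawler would fetch the file from the site rather than use an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of a site that keeps its trap under /trap/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed(ROBOTS_TXT, "ExampleBot", "http://foo.com/trap/a/b"))
print(allowed(ROBOTS_TXT, "ExampleBot", "http://foo.com/index.html"))
```

A polite crawler would run this check before every request and drop disallowed URLs, which keeps it out of the trap; a crawler that skips the check follows the links anyway.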

Excerpt source: Wikipedia, the free encyclopedia

Copyright(C) kotoba.ne.jp 1997-2016. All Rights Reserved.